Crate arrow_array

The central type in Apache Arrow is the array, a known-length sequence of values that all have the same type. This crate provides a concrete implementation for each data type, as well as an Array trait that can be used for type erasure.

Downcasting an Array

Arrays are often passed around as a dynamically typed &dyn Array or ArrayRef. For example, RecordBatch stores columns as ArrayRef.

Whilst these arrays can be passed directly to the compute, csv, json, and other APIs, you will often want to interact with the data directly.

This requires downcasting to the concrete type of the array:


use arrow_array::{Array, Float32Array, Int32Array};

fn sum_int32(array: &dyn Array) -> i32 {
    let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
    integers.iter().map(|val| val.unwrap_or_default()).sum()
}

// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}

Additionally, there are convenience functions for this casting, such as cast::as_primitive_array<T> and cast::as_string_array:


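use arrow_array::cast::as_primitive_array;
use arrow_array::types::Float32Type;
use arrow_array::Array;

fn as_f32_slice(array: &dyn Array) -> &[f32] {
    // Use as_primitive_array instead of a manual downcast
    as_primitive_array::<Float32Type>(array).values()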
}

Building an Array

Most Array implementations can be constructed directly from iterators or a Vec:


use arrow_array::types::Int32Type;
use arrow_array::{Int32Array, ListArray, StringArray};

Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);

StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);

ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![])
]);

Additionally, ArrayBuilder implementations can be used to construct arrays with a push-based interface:

use arrow_array::{Array, Int16Array};

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);

// Append a single primitive value
builder.append_value(1);

// Append a null value
builder.append_null();

// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);

// Build the array
let array = builder.finish();

assert_eq!(
    5,
    array.len(),
    "The array has 5 values, counting the null value"
);

assert_eq!(2, array.value(2), "Get the value with index 2");

assert_eq!(
    &array.values()[3..5],
    &[3, 4],
    "Get slice of len 2 starting at idx 3"
)

Zero-Copy Slicing

Given an Array of arbitrary length, it is possible to create an owned slice of its data. Internally this only increments some reference counts, so it is very cheap:

use std::sync::Arc;
use arrow_array::{Array, ArrayRef, Int32Array};

let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;

// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);
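
To illustrate why slicing is cheap, here is a toy model using only the standard library. SharedSlice is a hypothetical type and is not how this crate is implemented; it only sketches the general idea of a reference-counted allocation combined with an offset and a length:

```rust
use std::sync::Arc;

/// Toy model of a zero-copy slice (hypothetical, for illustration only):
/// a shared allocation plus a window into it.
struct SharedSlice {
    data: Arc<Vec<i32>>, // shared, reference-counted storage
    offset: usize,       // start of the visible window
    len: usize,          // number of visible elements
}

impl SharedSlice {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self { data: Arc::new(values), offset: 0, len }
    }

    /// Returns an owned slice that shares storage with `self`.
    /// No element is copied; only the reference count is incremented.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset + offset,
            len,
        }
    }

    /// Borrows the visible window as a plain slice.
    fn values(&self) -> &[i32] {
        &self.data[self.offset..self.offset + self.len]
    }
}
```

In this model, slicing a three-element value with slice(1, 2) yields a view over the last two elements that points into the same allocation, and slicing that view again simply adds the offsets.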

Internal Representation

Internally, arrays are represented by one or more Buffers, the number and meaning of which depend on the array’s data type, as documented in the Arrow specification.

For example, the type Int16Array represents an array of 16-bit integers and consists of:

  • An optional Bitmap identifying any null values
  • A contiguous Buffer of 16-bit integers

Similarly, the type StringArray represents an array of UTF-8 strings and consists of:

  • An optional Bitmap identifying any null values
  • An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
  • A values Buffer of UTF-8 encoded string data
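
The string layout listed above can be sketched with plain vectors. ToyStringArray and example are hypothetical names; the real types in this crate wrap Buffers and are not implemented this way. The sketch assumes the Arrow convention that the offsets buffer holds one more entry than the number of slots and that validity bits are read least-significant bit first:

```rust
/// Toy model of the StringArray layout (hypothetical, for illustration only).
struct ToyStringArray {
    validity: Option<Vec<u8>>, // bit i set (LSB first) means slot i is non-null
    offsets: Vec<i32>,         // len + 1 entries; slot i spans offsets[i]..offsets[i + 1]
    values: Vec<u8>,           // concatenated UTF-8 bytes of all strings
}

impl ToyStringArray {
    /// Number of slots, derived from the offsets buffer.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Checks the validity bit for slot `i`.
    fn is_valid(&self, i: usize) -> bool {
        match &self.validity {
            Some(bits) => (bits[i / 8] >> (i % 8)) & 1 == 1,
            None => true, // no bitmap means there are no nulls
        }
    }

    /// Returns the string in slot `i`, or `None` if the slot is null.
    fn get(&self, i: usize) -> Option<&str> {
        if !self.is_valid(i) {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}

/// Builds the layout for ["foo", null, "barbaz"].
fn example() -> ToyStringArray {
    ToyStringArray {
        validity: Some(vec![0b0000_0101]), // slots 0 and 2 are valid
        offsets: vec![0, 3, 3, 9],         // the null slot has zero length
        values: b"foobarbaz".to_vec(),
    }
}
```

Note that the null slot still has an entry in the offsets buffer; it simply spans zero bytes of the values buffer.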
